1 00:00:00,409 --> 00:00:02,766 [ELECTRONIC MUSIC] 2 00:00:11,280 --> 00:00:12,653 Live Text Access 3 00:00:12,653 --> 00:00:15,920 Training for real-time intralingual subtitlers. 4 00:00:16,640 --> 00:00:18,666 Unit 2. Linguistic competence 5 00:00:18,666 --> 00:00:22,880 Element 1. Functionality: accuracy, readability and legibility 6 00:00:24,880 --> 00:00:30,634 Accuracy assessment models created by UAB, SSLM and SDI 7 00:00:30,634 --> 00:00:33,125 The learning outcomes for this video lecture 8 00:00:33,125 --> 00:00:35,306 are to produce accurate transcriptions 9 00:00:35,306 --> 00:00:37,561 in terms of spelling, grammar and meaning 10 00:00:37,561 --> 00:00:40,381 and to produce legible and readable transcriptions 11 00:00:40,381 --> 00:00:44,439 both while creating the transcriptions and after the live situation 12 00:00:44,439 --> 00:00:47,280 by applying readability and legibility indicators. 13 00:00:49,440 --> 00:00:50,672 Agenda 14 00:00:50,672 --> 00:00:55,594 In this video lecture we will be dealing with the following points: 15 00:00:55,594 --> 00:00:57,200 From quantity to quality. 16 00:00:57,760 --> 00:01:01,697 Accuracy key areas. Accuracy assessment models. 17 00:01:01,697 --> 00:01:03,760 And we will finish with a summary. 18 00:01:06,640 --> 00:01:08,723 From quantity to quality. 19 00:01:08,723 --> 00:01:12,240 According to the EFHOH 20 00:01:13,120 --> 00:01:17,112 "it is well and good having subtitles 21 00:01:17,112 --> 00:01:19,187 but if they are not accurate 22 00:01:19,187 --> 00:01:23,804 then they are of little or no use to the consumer".
23 00:01:23,804 --> 00:01:27,920 This is a particular problem with regard to live subtitling. 24 00:01:29,040 --> 00:01:31,599 Accuracy key areas 25 00:01:31,599 --> 00:01:33,933 There are three key areas 26 00:01:33,933 --> 00:01:37,940 which will determine the accuracy of a real-time subtitle: 27 00:01:37,940 --> 00:01:41,523 lexical, which means correct pronunciation and/or spelling, 28 00:01:41,523 --> 00:01:44,751 grammar, which means correct syntactic structure 29 00:01:44,751 --> 00:01:45,785 of the unit, 30 00:01:45,785 --> 00:01:50,566 and semantic, which means that the meaning of the source text is conveyed. 31 00:01:50,566 --> 00:01:54,029 Errors in these elements can be due to human failure 32 00:01:54,029 --> 00:01:57,659 or due to a malfunction of the software 33 00:01:57,659 --> 00:01:59,835 and they may or may not affect 34 00:01:59,835 --> 00:02:05,188 the understanding of the meaning the speaker intends to deliver. 35 00:02:05,188 --> 00:02:08,182 Accuracy assessment models 36 00:02:08,182 --> 00:02:12,261 To measure accuracy there are two main models: 37 00:02:12,261 --> 00:02:17,067 the NER model, based on the Word Error Rate system, 38 00:02:17,067 --> 00:02:20,560 and the IRA model, which is a conceptual system. 39 00:02:21,440 --> 00:02:24,189 The NER model is especially effective 40 00:02:24,189 --> 00:02:26,960 when measuring the accuracy of verbatim subtitles 41 00:02:27,520 --> 00:02:31,473 because it's based on a Word Error Rate system 42 00:02:31,473 --> 00:02:37,052 in which all utterances of the speaker are respoken or typed.
43 00:02:37,052 --> 00:02:41,398 Therefore the source text from the speaker can be compared 44 00:02:41,398 --> 00:02:44,086 with the output text of the subtitles. 45 00:02:44,611 --> 00:02:49,479 The NER model, developed by Romero-Fresco in 2011, 46 00:02:49,479 --> 00:02:51,430 is a model to assess the accuracy 47 00:02:51,430 --> 00:02:54,706 by analysing the extent to which errors 48 00:02:54,706 --> 00:02:57,578 affect the coherence of the subtitle text 49 00:02:57,578 --> 00:02:59,440 or modify the content. 50 00:03:02,240 --> 00:03:06,655 In the NER model accuracy is measured following a formula 51 00:03:06,655 --> 00:03:12,265 in which N is the total number of words in the subtitles, 52 00:03:12,265 --> 00:03:18,412 E are the edition errors and R the recognition errors. 53 00:03:18,412 --> 00:03:23,286 So N, the total number of words of the subtitles, 54 00:03:23,286 --> 00:03:27,629 minus E, edition errors, minus R, recognition errors, 55 00:03:27,629 --> 00:03:33,743 will be divided by N and multiplied by 100. 56 00:03:33,743 --> 00:03:37,120 This will give us an accuracy rate 57 00:03:37,120 --> 00:03:40,499 saying how accurate the subtitles are 58 00:03:40,499 --> 00:03:43,132 but this is only an indication of the quality. 59 00:03:43,132 --> 00:03:48,775 The final rate should not be taken as an end result. 60 00:03:49,069 --> 00:03:50,999 There is also a section 61 00:03:50,999 --> 00:03:54,050 where correct editions can be included 62 00:03:54,050 --> 00:03:56,410 and an assessment can be made. 63 00:03:56,410 --> 00:04:00,092 In any case the target should not be less than 98%. 64 00:04:00,880 --> 00:04:03,787 This is the established threshold for subtitles 65 00:04:03,787 --> 00:04:06,721 that present an acceptable quality. 66 00:04:06,721 --> 00:04:10,383 This model is being used and can be used 67 00:04:10,383 --> 00:04:14,651 in any type of subtitling technique and working context 68 00:04:14,651 --> 00:04:18,221 as it is based on a word edition.
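The NER formula described above, (N - E - R) / N x 100, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ner_accuracy`, its parameter names and the sample figures are hypothetical and do not come from the lecture, and the error counts are passed in as plain numbers.

```python
def ner_accuracy(n_words, edition_errors, recognition_errors):
    """Return the accuracy rate in percent: (N - E - R) / N * 100."""
    if n_words <= 0:
        raise ValueError("n_words must be a positive number")
    return (n_words - edition_errors - recognition_errors) / n_words * 100


# Hypothetical figures for illustration only (not taken from the lecture).
rate = ner_accuracy(n_words=500, edition_errors=4, recognition_errors=3)
print(f"accuracy rate: {rate:.2f}%")
# The lecture gives 98% as the threshold for acceptable quality.
print("meets the 98% threshold:", rate >= 98.0)
```

As the lecture stresses, the resulting rate is only an indication of quality and should be read together with the qualitative assessment.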
69 00:04:18,221 --> 00:04:23,543 The IRA model is the Idea Units Rendition Assessment model. 70 00:04:23,543 --> 00:04:27,039 Ideas are any sentence type conveying a meaning 71 00:04:27,039 --> 00:04:30,630 regardless of its degree of syntactic dependency. 72 00:04:30,630 --> 00:04:35,320 Idea units are divided into rendered and non-rendered idea units. 73 00:04:35,320 --> 00:04:39,995 Rendered idea units are those represented in the output text 74 00:04:39,995 --> 00:04:44,716 while non-rendered units are those in which a message is not conveyed 75 00:04:44,716 --> 00:04:48,393 due to omission of the unit itself. 76 00:04:48,393 --> 00:04:52,000 Non-rendered idea units are omissions 77 00:04:52,560 --> 00:04:56,926 which occur when the software fails to reproduce a conceptual unit. 78 00:04:56,926 --> 00:05:00,913 Misrepresentations are due to semantic errors 79 00:05:00,913 --> 00:05:04,256 which compromise the decoding of the source text. 80 00:05:04,256 --> 00:05:06,720 Rendered units are repetitions 81 00:05:08,080 --> 00:05:11,676 which means units which are highly comprehensible 82 00:05:11,676 --> 00:05:14,832 due to their faithful verbatim reproduction of the words 83 00:05:14,832 --> 00:05:17,253 uttered in the source text 84 00:05:17,253 --> 00:05:22,139 and alterations which means that the unit contains minor errors 85 00:05:22,139 --> 00:05:27,257 which however do not deteriorate the comprehensibility of the message 86 00:05:27,257 --> 00:05:29,823 for the end user. 87 00:05:29,823 --> 00:05:33,809 There are two types of alterations: expansions and reductions. 88 00:05:33,809 --> 00:05:38,866 Expansion occurs when a conceptual unit is explained or disambiguated 89 00:05:38,866 --> 00:05:43,745 using a higher number of characters than those used in the source text.
90 00:05:43,745 --> 00:05:47,887 Reduction occurs when a unit is characterised 91 00:05:47,887 --> 00:05:52,241 by partial omission and/or partial or total reformulation. 92 00:05:52,241 --> 00:05:56,000 There are two kinds of reduction: omission and compression. 93 00:05:56,960 --> 00:06:01,360 Reduction of the original message is necessary for two main reasons: 94 00:06:02,080 --> 00:06:05,446 first to adapt the message to the reading speed of the viewers 95 00:06:05,446 --> 00:06:07,934 and secondly to avoid delay. 96 00:06:07,934 --> 00:06:13,385 If we centre our attention on alterations produced by reductions, 97 00:06:13,385 --> 00:06:17,898 real-time subtitlers should develop editing skills to reduce the message 98 00:06:17,898 --> 00:06:19,840 without losing key information. 99 00:06:21,440 --> 00:06:25,127 Reduction relies on two main strategies: 100 00:06:25,127 --> 00:06:28,465 omission, which involves deleting redundant information, 101 00:06:28,465 --> 00:06:31,600 and compression, which involves rewording in a shorter form. 102 00:06:33,120 --> 00:06:36,549 Omission is possible when the audience 103 00:06:36,549 --> 00:06:39,453 is supposed to have prior knowledge of the topic 104 00:06:39,453 --> 00:06:41,680 or when a speaker uses redundant words. 105 00:06:42,880 --> 00:06:45,000 Main types of omission. 106 00:06:45,000 --> 00:06:48,474 According to Romero-Fresco the main types of words 107 00:06:48,474 --> 00:06:51,837 omitted by live subtitlers in the case of redundant words 108 00:06:51,837 --> 00:06:56,966 are discourse markers: so, well, I mean, you know. 109 00:06:56,966 --> 00:07:00,654 Connectors: and, but, though. 110 00:07:00,654 --> 00:07:04,809 This means that respoken sentences are notably shorter 111 00:07:04,809 --> 00:07:07,278 than the source text sentences 112 00:07:07,600 --> 00:07:10,000 given that the absence of these conjunctions 113 00:07:10,000 --> 00:07:12,640 often entails the beginning of a new sentence.
114 00:07:14,000 --> 00:07:18,011 Also intensifiers: really, much more, well 115 00:07:18,011 --> 00:07:22,016 or repetitions and unimportant asides: 116 00:07:22,016 --> 00:07:27,399 "you were saying" or "worth bearing in mind". 117 00:07:28,720 --> 00:07:32,430 Compression is possible when a word or a group of words 118 00:07:32,430 --> 00:07:35,463 can be substituted by a shorter equivalent. 119 00:07:35,463 --> 00:07:39,087 Also long grammatical structures can be substituted 120 00:07:39,087 --> 00:07:44,100 by one-word prepositions, conjunctions or verbs. 121 00:07:44,771 --> 00:07:48,419 The main aim of a real-time intralingual subtitler 122 00:07:48,419 --> 00:07:52,949 should be to make sure that subtitles are clear, accurate and accessible 123 00:07:52,949 --> 00:07:54,490 for the user. 124 00:07:54,490 --> 00:07:55,684 Summary 125 00:07:55,684 --> 00:08:01,893 As a summary we would like to stress the following points: 126 00:08:01,893 --> 00:08:04,220 The NER model assesses verbatim accuracy 127 00:08:04,220 --> 00:08:07,338 and the IRA model assesses sensatim accuracy. 128 00:08:07,338 --> 00:08:12,804 Output text will present alterations based on reduction techniques 129 00:08:12,804 --> 00:08:15,628 such as omission and compression 130 00:08:15,628 --> 00:08:18,590 and real-time subtitlers have to make sure 131 00:08:18,590 --> 00:08:22,338 that subtitles are clear, accurate and accessible for the user. 132 00:08:22,674 --> 00:08:24,595 Exercises 133 00:08:25,000 --> 00:08:30,000 The exercises for this video lecture are in the Trainer's Guide 134 00:08:30,000 --> 00:08:31,360 and the PowerPoint file. 135 00:08:31,444 --> 00:08:33,720 [ELECTRONIC MUSIC]